Architecture for supporting Hardware Collectives in Output-Queued High-Radix Routers

Authors

  • Sameer Kumar
  • Laxmikant V. Kalé
  • Craig Stunkel
Abstract

Collective communication performance is critical for many applications. In this paper, we present an architecture to efficiently support collective operations (like multicasts and reductions) in the switches of parallel computer interconnects. We present an output queuing switch architecture with cross-point buffering. Output queuing architectures have been less popular in the past as they require more internal speedup and buffering. However, with current technology it is straightforward to build output-queued switches. We demonstrate in this paper that output-queued architectures make multicasts and reductions fairly easy to implement efficiently. We show the scalability of our schemes to a large number of switch ports. We present performance of multicasts and reductions on individual switches and networks of switches. We assume a fat-tree topology for the networks of switches. We also present simulation results based on synthetic workloads that emulate a molecular dynamics application.
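As an illustration of why cross-point buffering can make collectives simple, here is a minimal toy sketch in Python. It is not the paper's design: the class `CrosspointSwitch`, its methods, and the one-packet-per-crosspoint drain policy are all hypothetical and assumed purely for illustration. The sketch keeps one FIFO per (input, output) crosspoint, so a multicast is just a write into several buffers of the sender's row, and a reduction is a combine over the buffers of one output's column.

```python
from collections import deque
import operator


class CrosspointSwitch:
    """Toy output-queued switch with one FIFO per (input, output) crosspoint.

    Hypothetical model for illustration only; not the architecture from the paper.
    """

    def __init__(self, ports):
        self.ports = ports
        # xbuf[i][o] holds packets from input i that are waiting for output o.
        self.xbuf = [[deque() for _ in range(ports)] for _ in range(ports)]

    def multicast(self, src, dests, payload):
        # Replicate the packet by writing one copy into each destination's
        # crosspoint buffer in the source row; outputs then drain independently.
        for out in dests:
            self.xbuf[src][out].append(payload)

    def drain(self, out):
        # Take the head packet from every non-empty crosspoint in column `out`.
        column = (self.xbuf[i][out] for i in range(self.ports))
        return [q.popleft() for q in column if q]

    def reduce(self, out, op, initial):
        # Combine the head packets destined for `out` with a binary operator.
        result = initial
        for value in self.drain(out):
            result = op(result, value)
        return result


switch = CrosspointSwitch(ports=4)

# Multicast from input 0 to outputs 1, 2 and 3.
switch.multicast(0, [1, 2, 3], "hello")
copies = [switch.drain(out) for out in (1, 2, 3)]

# Reduction: every input contributes one value to output 0, which sums them.
for src, value in enumerate([1, 2, 3, 4]):
    switch.multicast(src, [0], value)
total = switch.reduce(0, operator.add, 0)
```

In this toy model no scheduler has to match inputs to outputs, which is the property the abstract attributes to output queuing; the cost is the per-crosspoint buffering, which grows with the square of the port count.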


Similar resources

1 Architectures of Internet Switches and Routers

Over the years, different architectures have been investigated for the design and implementation of high-performance switches. Particular architectures were determined by a number of factors based on performance, flexibility and available technology. Design differences were mainly a variation in the queuing functions and the switch core. The crossbar-based architecture is perhaps the dominant a...

Input-queued router architectures exploiting cell-based switching fabrics

Input queued and combined input/output-queued architectures have recently come to play a major role in the design of high-performance switches and routers for packet networks. These architectures must be controlled by a packet scheduling algorithm, which solves contentions in the transfer of data units to switch outputs. Several scheduling algorithms were proposed in the literature for switches...

A Practical Scheduling Algorithm to Achieve 100% Throughput in Input-Queued Switches

Input queueing is becoming increasingly used for high-bandwidth switches and routers. In previous work, it was proved that it is possible to achieve 100% throughput for input-queued switches using a combination of virtual output queueing and a scheduling algorithm called LQF. However, this is only a theoretical result: LQF is too complex to implement in hardware. In this paper we introduce a ne...

Buffer Sizing in a Combined Input Output Queued (CIOQ) Switch

In all internet routers buffers are needed to hold packets during times of congestion. In some recent work, the question of finding the minimum buffer size guaranteeing high throughput has been addressed [3] [6]. The answer to this question is particularly important in building all-optical routers, where the optical technology allows buffering up to a few dozen packets [7]. While in practice mo...

A Practical Scheduler For High-Speed Packet Switches and Internet Routers

Input-queued (IQ) crossbar-based switching, employing virtual output queueing (VOQ), is the dominant architecture for high-performance packet switches. The performance of a VOQ switch depends solely on the scheduling algorithm used. Maximum Weight Matching (MWM) algorithms have optimal performance; however, they are not practical due to their hardware complexity. Round Robin (RR) based algori...

Publication date: 2005